我们通过将其基于实现功能空间而不是参数空间的几何形状来系统地研究深度神经网络景观的方法。将分类器分组到等效类中,我们开发了一个标准化的参数化,其中所有对称性都被删除,从而导致环形拓扑。在这个空间上,我们探讨了误差景观而不是损失。这使我们能够得出有意义的概念,即最小化器的平坦度和连接它们的地球通道的概念。使用不同的优化算法,这些算法采样具有不同平坦度的最小化器,我们研究模式连接性和相对距离。测试各种最先进的体系结构和基准数据集,我们确认了平面度和泛化性能之间的相关性;我们进一步表明,在功能空间中,minima彼此更近,并且连接它们的大地测量学的屏障很小。我们还发现,通过梯度下降的变体发现的最小化器可以通过由参数空间中的两个直线组成的零误差路径连接,即带有单个弯曲的多边形链。我们观察到具有二进制权重和激活的神经网络中相似的定性结果,这为在这种情况下的连通性提供了第一个结果之一。我们的结果取决于对称性的去除,并且与对简单浅层模型进行的一些分析研究所描述的丰富现象学非常吻合。
translated by 谷歌翻译
当前的深度神经网络被高度参数化(多达数十亿个连接权重)和非线性。然而,它们几乎可以通过梯度下降算法的变体完美地拟合数据,并达到预测准确性的意外水平,而不会过度拟合。这些是巨大的结果,无视统计学习的预测,并对非凸优化构成概念性挑战。在本文中,我们使用来自无序系统的统计物理学的方法来分析非凸二进制二进制神经网络模型中过度参数化的计算后果,该模型对从结构上更简单但“隐藏”网络产生的数据进行了培训。随着连接权重的增加,我们遵循误差损失函数不同最小值的几何结构的变化,并将其与学习和概括性能相关联。当解决方案开始存在时,第一次过渡发生在所谓的插值点(完美拟合变得可能)。这种过渡反映了典型溶液的特性,但是它是尖锐的最小值,难以采样。差距后,发生了第二个过渡,并具有不同类型的“非典型”结构的不连续外观:重量空间的宽区域,这些区域特别是解决方案密度且具有良好的泛化特性。两种解决方案共存,典型的解决方案的呈指数数量,但是从经验上讲,我们发现有效的算法采样了非典型,稀有的算法。这表明非典型相变是学习的相关阶段。与该理论建议的可观察到的现实网络的数值测试结果与这种情况一致。
translated by 谷歌翻译
深度学习的成功揭示了神经网络对整个科学的应用潜力,并开辟了基本的理论问题。特别地,基于梯度方法的简单变体的学习算法能够找到高度非凸损函数的近最佳最佳最小值,是神经网络的意外特征。此外,这种算法即使在存在噪声的情况下也能够适合数据,但它们具有出色的预测能力。若干经验结果表明了通过算法实现的最小值的所谓平坦度与概括性性能之间的可再现相关性。同时,统计物理结果表明,在非透露网络中,多个窄的最小值可能与较少数量的宽扁平最小值共存,这概括了很好。在这里,我们表明,从“高边缘”(即局部稳健的)配置,从最小值的聚结会出现宽平坦的结构。尽管与零保证金相比具有呈指数稀有的稀有性,但高利润最小值倾向于集中在特定地区。这些最小值又被较小且较小的边距的其他解决方案包围,导致长距离的溶液区域密集。我们的分析还提供了一种替代分析方法,用于估计扁平最小值,当算法开始找到解决方案时,随着模型参数的数量变化。
translated by 谷歌翻译
在神经网络的经验风险景观中扁平最小值的性质已经讨论了一段时间。越来越多的证据表明他们对尖锐物质具有更好的泛化能力。首先,我们讨论高斯混合分类模型,并分析显示存在贝叶斯最佳点估算器,其对应于属于宽平区域的最小值。可以通过直接在分类器(通常是独立的)或学习中使用的可分解损耗函数上应用最大平坦度算法来找到这些估计器。接下来,我们通过广泛的数值验证将分析扩展到深度学习场景。使用两种算法,熵-SGD和复制-SGD,明确地包括在优化目标中,所谓的非局部平整度措施称为本地熵,我们一直提高常见架构的泛化误差(例如Resnet,CeffectnNet)。易于计算的平坦度测量显示与测试精度明确的相关性。
translated by 谷歌翻译
Diffusion models have achieved justifiable popularity by attaining state-of-the-art performance in generating realistic objects from seemingly arbitrarily complex data distributions, including when conditioning generation on labels. Unfortunately, however, their iterative nature renders them very computationally inefficient during the sampling process. For the multi-class conditional generation problem, we propose a novel, structurally unique framework of diffusion models which are hierarchically branched according to the inherent relationships between classes. In this work, we demonstrate that branched diffusion models offer major improvements in efficiently generating samples from multiple classes. We also showcase several other advantages of branched diffusion models, including ease of extension to novel classes in a continual-learning setting, and a unique interpretability that offers insight into these generative models. Branched diffusion models represent an alternative paradigm to their traditional linear counterparts, and can have large impacts in how we use diffusion models for efficient generation, online learning, and scientific discovery.
translated by 谷歌翻译
The polynomial kernels are widely used in machine learning and they are one of the default choices to develop kernel-based classification and regression models. However, they are rarely used and considered in numerical analysis due to their lack of strict positive definiteness. In particular they do not enjoy the usual property of unisolvency for arbitrary point sets, which is one of the key properties used to build kernel-based interpolation methods. This paper is devoted to establish some initial results for the study of these kernels, and their related interpolation algorithms, in the context of approximation theory. We will first prove necessary and sufficient conditions on point sets which guarantee the existence and uniqueness of an interpolant. We will then study the Reproducing Kernel Hilbert Spaces (or native spaces) of these kernels and their norms, and provide inclusion relations between spaces corresponding to different kernel parameters. With these spaces at hand, it will be further possible to derive generic error estimates which apply to sufficiently smooth functions, thus escaping the native space. Finally, we will show how to employ an efficient stable algorithm to these kernels to obtain accurate interpolants, and we will test them in some numerical experiment. After this analysis several computational and theoretical aspects remain open, and we will outline possible further research directions in a concluding section. This work builds some bridges between kernel and polynomial interpolation, two topics to which the authors, to different extents, have been introduced under the supervision or through the work of Stefano De Marchi. For this reason, they wish to dedicate this work to him in the occasion of his 60th birthday.
translated by 谷歌翻译
This paper presents the development of a system able to estimate the 2D relative position of nodes in a wireless network, based on distance measurements between the nodes. The system uses ultra wide band ranging technology and the Bluetooth Low Energy protocol to acquire data. Furthermore, a nonlinear least squares problem is formulated and solved numerically for estimating the relative positions of the nodes. The localization performance of the system is validated by experimental tests, demonstrating the capability of measuring the relative position of a network comprised of 4 nodes with an accuracy of the order of 3 cm and an update rate of 10 Hz. This shows the feasibility of applying the proposed system for multi-robot cooperative localization and formation control scenarios.
translated by 谷歌翻译
Steerable convolutional neural networks (CNNs) provide a general framework for building neural networks equivariant to translations and other transformations belonging to an origin-preserving group $G$, such as reflections and rotations. They rely on standard convolutions with $G$-steerable kernels obtained by analytically solving the group-specific equivariance constraint imposed onto the kernel space. As the solution is tailored to a particular group $G$, the implementation of a kernel basis does not generalize to other symmetry transformations, which complicates the development of group equivariant models. We propose using implicit neural representation via multi-layer perceptrons (MLPs) to parameterize $G$-steerable kernels. The resulting framework offers a simple and flexible way to implement Steerable CNNs and generalizes to any group $G$ for which a $G$-equivariant MLP can be built. We apply our method to point cloud (ModelNet-40) and molecular data (QM9) and demonstrate a significant improvement in performance compared to standard Steerable CNNs.
translated by 谷歌翻译
The development and adoption of artificial intelligence (AI) technologies in space applications is growing quickly as the consensus increases on the potential benefits introduced. As more and more aerospace engineers are becoming aware of new trends in AI, traditional approaches are revisited to consider the applications of emerging AI technologies. Already at the time of writing, the scope of AI-related activities across academia, the aerospace industry and space agencies is so wide that an in-depth review would not fit in these pages. In this chapter we focus instead on two main emerging trends we believe capture the most relevant and exciting activities in the field: differentiable intelligence and on-board machine learning. Differentiable intelligence, in a nutshell, refers to works making extensive use of automatic differentiation frameworks to learn the parameters of machine learning or related models. Onboard machine learning considers the problem of moving inference, as well as learning, onboard. Within these fields, we discuss a few selected projects originating from the European Space Agency's (ESA) Advanced Concepts Team (ACT), giving priority to advanced topics going beyond the transposition of established AI techniques and practices to the space domain.
translated by 谷歌翻译
The term ``neuromorphic'' refers to systems that are closely resembling the architecture and/or the dynamics of biological neural networks. Typical examples are novel computer chips designed to mimic the architecture of a biological brain, or sensors that get inspiration from, e.g., the visual or olfactory systems in insects and mammals to acquire information about the environment. This approach is not without ambition as it promises to enable engineered devices able to reproduce the level of performance observed in biological organisms -- the main immediate advantage being the efficient use of scarce resources, which translates into low power requirements. The emphasis on low power and energy efficiency of neuromorphic devices is a perfect match for space applications. Spacecraft -- especially miniaturized ones -- have strict energy constraints as they need to operate in an environment which is scarce with resources and extremely hostile. In this work we present an overview of early attempts made to study a neuromorphic approach in a space context at the European Space Agency's (ESA) Advanced Concepts Team (ACT).
translated by 谷歌翻译